eXtract: a snippet generation system for XML search
Abstract
Almost every text search engine uses snippets to complement its ranking scheme and handle user keyword search effectively. Although XML is a standard representation format for web data, generating result snippets for XML search remains unexplored. In this work, we present eXtract, a system that efficiently generates self-contained result snippets within a given size bound. These snippets summarize the query results and differentiate them from one another, so that users can quickly assess the relevance of each result.
Similar resources
Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation
Snippet generation plays an important role in a search engine. Good snippets give users a good indication of the main content of a search result related to the query and of whether relevant information can be found in it. Previous studies on snippet generation focused on selecting sentences that are related to the query and to the document. However, the resulting snippet may look highly relevant...
Parsing the Wiki Collection and Snippet Generation (M.S. thesis, University of Minnesota, by Sai Subramanyam Chittilla)
Information Retrieval (IR) is a field that deals with retrieving useful information from large sets of data in response to a query. Much information in this digital age is stored in XML format, which associates a structure with a document. Though IR systems have been used for years to access documents, the field has greatly expanded with the emergence of the World Wide Web, which emphasizes th...
From Focused Elements to Snippets (M.S. thesis, University of Minnesota, by Supraja Nagalla)
Information Retrieval is a field of computing which traditionally deals with searching a large collection of documents and retrieving documents based on their similarity to the query. INEX [10] provides a platform (e.g., document collection, queries and uniform evaluation metrics) for the development and evaluation of retrieval algorithms for XML documents. The focus of INEX is to reduce the gr...
Compression of Semistructured Documents
EGOTHOR is a search engine that indexes the Web and lets users search Web documents. Its hit list contains the URL and title of each hit, along with a snippet that briefly shows a match. The snippet can almost always be assembled by an algorithm that has full knowledge of the original document (mostly an HTML page). This implies that the search engine is required to store the full text...
Automatic Snippet Generation for Music Reviews
Review aggregator sites (RottenTomatoes.com, Metacritic.com) use snippets to convey the overall gist of the reviews they include in their coverage. These snippets are typically sentences extracted directly from the original review. In this paper we focus on snippet generation in the domain of music reviews—that is, how do you choose a snippet from a music review that best captures the opinion o...
Journal title:
- PVLDB
Volume 1
Publication date: 2008